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ABSTRACT 

Precision measurements of the galaxy power spectrum P{k) require a data analysis pipeline that is both 
fast enough to be computationally feasible and accurate enough to take full advantage of high-quality 
data. We present a rigorous discussion of different methods of power spectrum estimation, with emphasis 
on the traditional Fourier method, and linear (Karhunen-Loeve; KL), and quadratic data compression 
schemes, showing in what approximations they give the same result. To improve speed, we show how 
many of the advantages of KL data compression and power spectrum estimation may be achieved with 
a computationally faster quadratic method. To improve accuracy, we derive analytic expressions for 
handling the integral constraint, since it is crucial that finite volume effects are accurately corrected 
for on scales comparable to the depth of the survey. We also show that for the KL and quadratic 
techniques, multiple constraints can be included via simple matrix operations, thereby rendering the 
results less sensitive to galactic extinction and mis-estimates of the radial selection function. We present 
a data analysis pipeline that we argue does justice to the increases in both quality and quantity of 
data that upcoming redshift surveys will provide. It uses three analysis techniques in conjunction: a 
traditional Fourier approach on small scales, a pixelized quadratic matrix method on large scales and a 
pixelized KL eigenmode analysis to probe anisotropic effects such as redshift-space distortions. 



1. INTRODUCTION 

Observational data on galaxy clustering are rapidly in- 
creasing in both quantity and quality, which brings new 
challenges to data analysis. As for quantity, redshifts had 
been published for a few thousand galaxies 15 years ago. 
Today the number is ~ 10 5 (Huchra, private communica- 
tion), and ongoing projects such as the AAT 2dF Survey 
and the Sloan Digital Sky Survey (hereafter SDSS; see 
Gunn & Weinberg 1995) will raise it to 10 6 in a few years. 
Comprehensive reviews of past redshift surveys are given 
by Efstathiou (1994), Vogeley (1995), Strauss & Willick 
(1995) and Strauss (1997), the last also including a de- 
tailed description of 2dF and SDSS. As for quality, more 
accurate and uniform photometric selection criteria (en- 
abled by e.g. the well-calibrated 5-band photometry of the 
SDSS) reduce potential systematic errors. This increased 
data quality makes it desirable to avoid approximations 
in the data analysis process and to use methods that can 
constrain cosmological quantities as accurately as possible. 



This is especially important since a wide variety of mod- 
els currently appear to be at least marginally consistent 
with the present data (Peacock 1997; White et al. 1996; 
Vogeley 1997), so smaller error bars on the power spec- 
trum will be needed to discriminate between them. How- 
ever, the increased data quantity makes it a real challenge 
to perform such an accurate analysis; as we will discuss 
at some length, a straightforward application of any such 
method is computationally unfeasible for data sets as large 
as those from 2dF and SDSS. 

This paper addresses both the accuracy and speed issues 
for measuring the galaxy power spectrum P(k). For the 
highest accuracy, we advocate the use of lossless pixelized 
data compression methods, both linear (Karhunen-Loeve) 
and quadratic. To improve the speed, we present a fast 
implementation of a pixelized quadratic power estimation 
method, and show how it can reproduce Karhunen-Loeve 
results (Vogeley & Szalay 1996; Tegmark, Taylor & Heav- 
ens 1997 — hereafter TTH) exactly, without the need to 
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solve eigenvalue problems, although it does not have the 
ability to measure redshift-space distortions. 

The rest of the paper is organized as follows. Section || is 
a "buyers guide" , where we list the various properties that 
are important when deciding which data analysis method 
to use. In Section [?| we review the existing methods for 
power spectrum estimation in a common framework, make 
various extensions of them and present the fast quadratic 
data compression and power spectrum technique. Sec- 
tion ^ describes how the various methods are interrelated, 
and Section || discusses how they can be immunized from 
various systematic effects, such as errors in the extinction 
model; many of the technical details are given in Appen- 
dices A and B. Section ^ discusses the pros and cons of the 
different techniques. We conclude that to meet all criteria 
on the wish list of Section ^, it is necessary to combine 
three of the principal methods described. The data anal- 
ysis pipeline that we propose is summarized in Figure 1, 
and the reader may wish to glance at this before delving 
into the details of Section [| to obtain an overview of how 
everything fits together. We ignore redshift-space distor- 
tions for most of this paper, and assume that clustering is 
isotropic. In Section o and Appendix C, we discuss this 
issue further. 



2. COMPARING METHODS: A WISH LIST 

Since the ultimate goal of the analysis of clustering in 
a galaxy redshift survey is to constrain cosmological mod- 
els, we want a method that minimizes the statistical error 
bars on cosmological parameters 1 and is robust against po- 
tential systematic errors. We will now discuss the former 
issue in some detail, and return to the latter in Section |^ 
and Appendix B. 

2.1. The problem to be solved 

The data set from a galaxy redshift survey consists 
of N three-dimensional vectors r a (a = 1,...,N) giving 
the measured positions of the galaxies. Following Pee- 
bles (1973, 1980), we model these positions as generated 
by a random Poissonian point process where the galaxy 
density is modulated by both selection effects and fluctu- 
ations in the underlying matter distribution. The former 
are described by n(r), the selection function of the galaxy 
survey under consideration, defined as the expected galaxy 
density. Thus n(v)dV is the expected (not the observed) 
number of galaxies in a volume dV about r in the absence 
of clustering. This function typically falls off at large dis- 
tances, and can exhibit small angular variations due to 
extinction. It vanishes outside the survey volume. The 
fluctuations in the underlying matter density are given by 
the field 5 r (r) , which is not to be confused with the Dirac 
delta function S D . This means that the observed galaxy 
distribution n(r) = J2 a ^ D ( r ~ r ") i s modeled as a 3D 
Poisson process with intensity A(r) = n(r)[l + <5 r (r)]- 

The density fluctuations S r are modeled as a homoge- 
neous and isotropic (but not necessarily Gaussian) random 

field. This implies that the Fourier transform 5 r of the 

1 This is not a mere unimportant detail, since doubling the error bars (which an inferior method can easily do) is comparable to reducing 
the survey volume and the number of galaxies probed by a factor of four. 

2 Note that equation (|l|) only holds if positions are measured in real space. Redshift-space distortions couple modes of different k; see 
Appendix C for more details. 



density fluctuation field obeys the simple relation 

(? r (k)*? r (k')> = (2vr) 3 ^(k - k')P(k) (1) 

for some function P which depends only on the magni- 
tude of k, not on its direction 2 P is known as the power 
spectrum. Because of equation ([!]) , P contains all the infor- 
mation needed to compute any statistical quantities that 
depend quadratically on <5 r , for instance the variance or 
the correlation function on different scales. Moreover, if 
the random field S r is Gaussian (which means that the 
joint probability distribution of 5 r at any number of points 
is a multivariate Gaussian), then P characterizes 5 r com- 
pletely and contains all the information needed to answer 
any statistical question whatsoever about 5 r . Inflationary 
theory (e.g., Peebles 1993) and observations of large-scale 
structure (e.g., Strauss & Willick 1995) imply that 8 r is 
Gaussian on large scales, making P an important quantity 
in cosmology. The power spectrum estimation problem, 
which is the topic of this paper, is to estimate P{k) given 
the observed realization of n(r). 

2.2. The traditional approach 

Let us parameterize the power spectrum P{k) by some 
set of parameters 9i,i — 1,2,..., grouped into a vector 
©. These may be either the band powers in a set of nar- 
row bands, or physically motivated parameters such as the 
normalization as, the shape parameter T, the primordial 
spectral index n, etc. Let us package our data set into a 
vector x; much of the distinction between different meth- 
ods discussed in Section || lies in the way this packaging 
is done. The standard approach to parameter estimation 
is to write down the expression for the probability dis- 
tribution /(x;0). Here we interpret / as a probability 
distribution over x for a fixed 0. In a Bayesian statistical 
analysis with a uniform prior probability distribution for 
0, one reinterprets / as a probability distribution over © 
for a given data set x, and to clarify this distinction re- 
names / the likelihood function. The final results are often 
presented as contour plots of this likelihood function, as 
at the bottom of Figure 1. 

If we take x to be the raw data set, i.e., the measured 
coordinates r a (a = 1, N) of the N measured galaxies, 
then the likelihood function / is unfortunately hopeless to 
compute numerically, since it involves the A-point correla- 
tion function. Even in the Gaussian approximation that / 
is given by a product over two-point correlation functions 
(e.g., White 1979; Fry 1984), this requires evaluating a 
multivariate polynomial of degree N/2 in the correlations 
of the N(N+1) /2 galaxy pairs, and the CPU time required 
for this grows faster than exponentially with A (Dodelson, 
Hui, & Jaffe 1997). The traditional approach has therefore 
been to take x to be something else: band-power estimates 
of the power spectrum. These are essentially computed 
by multiplying the observed density field by some weight 
function, Fourier transforming it, taking the squared mod- 
ulus ofthe result and averaging over shells in fc-space (Sec- 
tion 3.3). In the (sometimes poor) approximation that the 
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probability distribution for x is a multivariate Gaussian, 
its probability distribution is 



/(x;0)oc |C| 



-l/2 e -i(x-m)tC- I (x-m) 



(2) 



where |C| denotes the determinant of C, and the mean vec- 
tor m = (x) and the covariance matrix C = (xx') — mm^ 
depend on P(k) and hence on the unknown parameters 
0. Much of the model testing to date has been rather ap- 
proximate, often little more than a "x-by-eye" fit of the- 
oretical power spectra to the data, which is tantamount 
to ignoring the correlations between the power estimates 
(the off-diagonal elements of C). This approach clearly 
does not utilize all the information present in the data, 
and can also bias the results. 

If we had infinite computer resources, we would improve 
the situation by simply performing an exact brute force 
likelihood analysis on the raw data set. Is there a faster 
way of obtaining the same result? 

2.3. The notion of a lossless method 

We will call a method for analyzing a data set unbeatable 
or optimal if no other method can place tighter constraints 
on cosmological models (as parameterized by 0) using this 
data. In this paper, we are focussed on the power spec- 
trum measured in real space, and so we restrict ourselves 
to measurements of the parameters which determine the 
power spectrum itself. 

2.3.1. The Fisher Information Matrix 

This can be made precise using the formalism of the 
Fisher information matrix (see TTH for a comprehensive 
review of this application), which offers a simple and a 
useful way of measuring how much information each step 
in the pipeline of Figure 1 destroys. Given any set of cos- 
mological parameters of interest denoted 9i, i = 1,2,..., 
their Fisher matrix F gives the smallest error bars with 
which the parameters can possibly be measured from a 
given data set. If the probability distribution for the data 
set given the parameter values is /(x; 0), then the Fisher 
matrix is defined by (Fisher 1935) 



d 2 In/ 



(3) 



Crudely speaking, F _1 can be thought of as the best pos- 
sible covariance matrix for the measurement errors on the 
parameters. Indeed, the Cramer-Rao inequality (Ken- 
ney & Keeping 1951; Kendall & Stuart 1969) states that 
no unbiased method whatsoever can measure the i th pa- 
rameter with error bars (standard deviation) less than 
l/\/Pii' If the other parameters are not known but are 
estimated from the data as well, the minimum standard 
deviation rises to (F -1 )]/ 2 . This formalism has recently 
been used to assess the accuracy with which cosmological 
parameters can be measured from future galaxy surveys 
(Tegmark 1997b; Goldberg & Strauss 1998; Hu, Eisen- 
stein, & Tegmark 1997) and cosmic microwave background 
experiments (Jungman et al. 1996; Bond, Efstathiou, & 
Tegmark 1997; Zaldarriaga, Seljak, & Spergel 1997). 

2.3.2. Checking for leaks in the pipeline 



By computing the Fisher matrix separately from each 
of the intermediate data sets in Figure 1, we can track the 
flow of information down the data pipeline and check for 
leaks. For instance, if the Fisher matrix computed from 
the raw positional data is identical to that computed from 
the (much smaller) data set consisting of the band power 
estimates, then the power spectrum estimation method is 
lossless in the sense that no information about the param- 
eters of interest has been lost in the process of compressing 
the data set from, say, 10 6 numbers down to 50; cf., the 
discussion in Section 2.4. We will use this criterion when 



comparing different data analysis techniques below. 

2.3.3. The power spectrum Fisher matrix 

Whether a method is lossless or not generally depends 
on which parameters we are interested in estimating. For- 
tunately, as shown by Tegmark (1997a, hereafter T97), 
certain methods can be shown to be lossless for any set 
of parameters in a large class. An important special case 
are quantities that parameterize the power spectrum P{k), 
such as erg and T; all the information on these parameters is 
retained if the power spectrum itself (parameterized by the 
power in many narrow bands) can be measured with the 
minimal error bars. This means that one can test whether 
a method is lossless simply by computing the Fisher matrix 
for the band powers. This also means that band powers 
have a special status compared to other parameters: if we 
simply measure P (k) as accurately as possible, this mea- 
sured function will retain all the information about all cos- 
mological parameters. All this is of course only true within 
the framework of Gaussian and isotropic models, which are 
uniquely characterized by their power spectrum. Specifi- 
cally, these methods do lose information about parameters 
that affect the data set not only via P(k). Important ex- 
amples to which we will return are redshift-space distor- 
tions and uncorrected galactic extinction, both of which 
introduce differences between the angular and radial clus- 
tering patterns. On small scales, nonlinear clustering cre- 
ates non-Gaussian fluctuations where in addition to P(k), 
higher order moments also contain cosmological informa- 
tion. 



2.4. Data compression, simplicity and 

A second and rather obvious criterion for comparing 
data analysis methods is their numerical feasibility. For in- 
stance, the brute-force likelihood analysis of the raw data 
set (the galaxy positions) described above is lossless, but 
too time-consuming to be numerically feasible when N, 
the number of galaxies, is large. 

When the brute-force method is unfeasible, the general 
approach is to perform some form of data compression, 
whereby the data set is reduced to a smaller and simpler 
one which is easier to analyze. If the data compression 
step is lossless, a brute-force analysis on the compressed 
data set clearly gives just as small error bars as one on the 
raw data would have done. 

To facilitate parameter estimation further down the 
pipeline, it is useful if the statistical properties of the power 
spectrum estimates are simple and easy to calculate. The 
simplest case (which often occurs with the pixelized meth- 
ods described below) is that where the data set is a mul- 
tivariate Gaussian, described by the likelihood function 
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of equation (g). Then the slowest step in the likelihood 
calculation is computing the determinant of the N x N 
covariance matrix C, for which the CPU time scales as 
N 3 , so it is desirable to make the compressed data set as 
small as is possible without losing information 3 . It is also 
convenient if the statistical properties of the compressed 
data set, in particular, its covariance matrix, can be com- 
puted analytically. We will see that this is the case for the 
Karhunen-Loeve and quadratic data compression methods 
describe below, but not for the the maximum-likelihood 
method, where it requires numerical computation of the 
entire likelihood surface. Finally, the simplest covariance 
matrix one can desire is clearly one that is diagonal, i.e., 
where the errors on the elements of the compressed data 
set are uncorrelated. 

2.5. The wish list 

In summary, the ideal data analysis/data compression 
method would 

1. be lossless, at least for the parameters of interest, 

2. give easy-to-compute and uncorrelated errors, 

3. be computationally feasible in practice, 

4. allow one to account for redshift-space distortions 
and systematic effects. 

The first two items can be summarized by saying that we 
want the method to retain the cosmological information 
content of the original data set, distilled into a set of mu- 
tually exclusive (2) and collectively exhaustive (1) chunks. 
In other words, the information chunks should be inde- 
pendent and together retain all the information from the 
original data set 
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In this section, we review the various methods for power 
spectrum estimation that have been proposed in the lit- 
erature and present various extensions. We start in Sec- 
tion 3.1 by developing a for mal ism that is common to all 
our approaches. In Section |3.2|, we discuss how the data 



might be discretized; that is, different ways of packaging 
the data into a convenient form x. We then discuss various 
methods of power spectrum estimation: traditional meth- 
ods which take the square of the amplitude of the Fourier 
modes (Section |3.3|) , the method of brute force likelihood 
(Section the lin ear Karhunen-Loeve data compres- 

sion method (Section |3.5| ) , an d a new quadratic data com- 
pression method (Section |3.6|) . Our suggested approach to 
power spectrum estimation involves a combination of these 
methods in different regimes, as summarized in Figure 1, 
and described in more detail in Section [?]. Throughout 
Section ^, we ignore redshift-space distortions and other 
systematic effects, and assume that clustering is isotropic. 
We return to this topic in Section [|. 

3.1. Density field, shot noise and window functions 



With the exception of the brute force maximum likeli- 
hood technique, all of the methods described below com- 
pute band power estimates qi that are quadratic functions 
of the density field n, which means that they can be writ- 
ten as 



Ej(r a ,rp) 

n(r Q )n(r /3 ) 



J J n(r) n(r') ^ 

for some real- valued symmetric pair weighting functions Ei 
which are designed to isolate different ranges of wavenum- 
ber k — the methods simply differ in their choices of Ei. 
Taking the expectation value of n(r)n(r')/n(r)n(r') pro- 
duces three terms: "1" from the mean density, S D (r — 
r')/n(r) from shot noise and S r (r)S r (r') from density fluc- 
tuations. By a derivation analogous to FKP and T95, one 
finds that these three terms give 



(q i ) = W i (0)+b i + / Wi(k)P(fc) 



d 3 k 
(2^)3 



(5) 



where Wi, the three-dimensional window functions, are 
given by 

Wi(k) = Si(k,k) (6) 



and 



£i(k,k')= / £i(r,r' 



ik'.: 



d 3 rd 3 



(7) 



is a Fourier transform of Ei. We will often find it 
convenient to rewrite the last term of equation (^) as 
J °° Wi(k)P{k)dk, where the one- dimensional window 
function is the angular average 



Wi(k) = k 2 / Wi{k)dn k 



The shot noise bias is given by 



n(r) 



(8) 



(9) 



Alternatively, 
terms with a 



bt can be made to vanish by omitting the 
= j3 from the double sum in equation (|), 
as described in Appendix A. The term Wi(0) simply 
probes the mean density of the survey, and as described 
in Section |^ and Appendix B, the functions Ei should al- 
ways be chosen such that this term vanishes, i.e., so that 
J Ei(r, r')d 3 rd 3 r' = 0, thereby immunizing the power esti- 
mates to errors in normalization of n (Section 5.1) 4 . The 
desirability of choosing windows with this property was 
first explicitly pointed out by Fisher et al. (1993), and 
this prescription was used also by e.g. Hamilton (1992) 
and Cole, Fisher & Weinberg (1994). We want to inter- 
pret qi in equation (0) as probing a weighted average of 
the power spectrum, with the window function giving the 
weights, so Ei should be normalized so that Wi(k) inte- 
grates to unity (throughout this paper, we will write the 



volume element in Fourier space as d 3 k/(2ir) 3 rather than 

,? Another problem with large N is the memory requirements for an N X N covariance matrix; if N = 10 5 , for example, this is currently 
beyond the RAM capacity of all but the very largest computers. 

4 In fact, the stronger constraint J Ei(r, r')d?r' = for all r should be enforced, as discussed in Appendix B. 
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d k, since this minimizes the number of 27r-factors else- 
where). Using equation @ and Parseval's theorem, this 
gives 



^r ,, '' <( ' ) (w = / >, '' ,k, (S f= / E ' (t ' t) ' ,3r ' 

(10) 

so if we think of Ei(r,r') as a matrix with indices r and 
r', this normalization condition is simply tr [£",] = 1. 

As described in Hamilton (1997a), the window function 
has a simple geometrical interpretation. Let us rewrite 
equation (g) as 



Wi(k) = Airk 2 / w t (r)j Q (kr)dr, (11) 

JQ 

where the separation weighting is defined by 

Wi(d)= f f E l (r 1 r')S D (\r-r'\-d)d 3 rd 3 r'. (12) 



The separation weighting is the average of Ei over all pairs 
of points separated by a fixed distance d, weighted by the 
number of such pairs. Equation ([ll]) shows that the only 
aspect of Ei that affects the window function is the sepa- 
ration weighting, since two different Ei that give the same 
u>i will produce identical window functions. 

An important special case, to which we return in the 
next subsection, is that where Ei is of rank one, i.e., of 
the separable form 



Ei(r, 



^(r)^(r')* 



(13) 



for some ?pi. In this case, equation (0) can be written as 



|xi| 2 , where Xi = J ipi(r)n(r) / n(r) (see equation (15) 



below), and the window function becomes simply 
Wi(k) = |^(k)| 5 



(14) 



(Here and throughout, hats denote Fourier transforms; 
?/>(k) = J e -lk ' r -0(r)ii 3 r — see Appendix D. ) W e will see 
that both the tradit iona l methods (Section |3.3| ) and the 
KL method (Section |3.5| ) are of this separable form, while 
the quadratic data compression method that we present 
in Section is not. We will occasionally refer to the 
former methods as linear data compression, since they be- 
gin by taking linear combinations of the data with weight 
functions ipi(r); cf., equation (|l5|). The latter method, on 
the other hand, is intrinsically quadratic since qi is not 
merely the square of some quantity that is linear in the 
data. This is because its pair weighting E is optimized 
to provide a minimum variance power estimator, which 
makes E a quadratic form of rank greater than one. 

To avoid confusion, the reader should bear in mind that 
when we distinguish between linear and quadratic meth- 
ods below, we are referring to linear versus quadratic data 
compression. The power estimator qi is of course quadratic 
in both cases. 

3.2. Pixelization 

Hamilton (1997a) has recently derived the functions Ei 
that provide the minimum-variance power spectrum esti- 
mates for an arbitrary selection function and survey ge- 
ometry. Unfortunately, this optimal weighting scheme is 



in general impractical to implement numerically, as it in- 
volves a computationally cumbersome infinite series ex- 
pansion — only in the small-scale limi t do es it become 
simple, as will be described in Section 4.3. To proceed 
numerically, it is therefore convenient to discretize the 
problem. This reduces it to one similar to that of cosmic 
microwave background (CMB) experiments: estimating a 
power spectrum given noisy fluctuation measurements in a 
number of discrete "pixels" . Once the pixelization is done, 
the remaining steps are quite analogous to the CMB case 
(T97), and involve mere matrix operations such as inver- 
sion and diagonalization. These operations can often be 
further simplified by a more suitable choice of pixelization. 
Let us define the overdensity in N "pixels" x\, xn by 



n(r) 
n(r) 



ipi{r)d 3 r 



(15) 



for some set of functions r/'t- We discuss specific choices 
of tfji m some detail below. One generally strives either 
to make these functions fairly localized in real space (in 
which case the pixelization is a generalized form of counts 
in cells) or fairly localized in Fourier space (in which case 
we will refer to the functions ipi as "modes" and to Xi as 
expansion coefficients). The reader should be warned that 
the latter case is far from what would normally be thought 
of as "pixels" , and that we will be using this terminology 
nonetheless to stress that all discretization schemes can be 



uivalent way. 

simply subtracts off the 



treated in a mathematically eqi 
The "—1" term in equation v 
mean density. As we discuss in Section ^| and Appendix 
B, one can and should choose the weight functions ipi to 
have zero mean, making this term irrelevant (note indeed 
our use of this fact immediately preceding equation (|l4"|)). 
This corresponds to requiring that 



ipi(r)d 3 r = 0. 



(16) 



Let us group the pixels Xi into an ./V-dimensional vector 
x. From equations (|l5| ) and ([Hj]) and a generalization of 
the derivation of equation (||), T95 shows that 



W 

vt\ 



0. 

c 



N 



where the shot noise covariance matrix is given by 



N; 



^(r)^(r)* 
n{r) 



d 3 r 



and the signal covariance matrix is 



; d 3 k 
S tj = I ^(k)V i (k)*P(fc)— -3. 



(17) 
(18) 



(19) 



(20) 



How should we choose our ipi's to pixelize space? For a 
pixelization to be useful, we clearly want the data set x to 
retain as large a fraction as possible of the cosmological in- 
formation from the original data set (the galaxy positions), 
while simultaneously simplifying subsequent calculations. 
We here list several natural options, most of which have 
appeared in the literature. 
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3.2.1. Counts in cells 

Here one partitions the survey volume into N mutually 
exclusive and collectively exhaustive volumes Vi, and de- 
fines ijji(r) = n(r) if r € Vi, ipi(r) = elsewhere. Thus Xi 
is simply the number of galaxies observed in Vi , minus the 
expected average. A set of useful approximations for com- 
puting N and S for this case is derived in VS96. To keep 
the number of cells from becoming intractably large, one 
might choose the cells to be larger in distant and poorly 
sampled regions of space than nearby. 

With these sharp-edged cells, any linear combinations of 
pixels will correspond to a weight function that is discon- 
tinuous at cell boundaries. To avoid power leakage prob- 
lems that this can in principle cause, one might use cells 
with "fuzzy" boundaries instead, for instance Gaussians 
as described in T95. 

One can greatly simplify the computation of the covari- 
ance matrix by choosing all cells to have the same shape 
and to be spherically symmetric (e.g., spheres or Gaus- 
sians), since the resulting Sy will depend only on the sep- 
aration of the pixel centers and this correlation function 
can be pre-computed and splined once and for all. For 
the volume-limited case (n constant), performing the ap- 
propriate integrals for identical spherical cells of radius R, 
separated by uR, gives 



N,; 



(2 — «) (4+u) -y 
16 V 





if u< 2, 
if u > 2, 



(21) 



where V = 4nR 3 /3. 



3.2.2. Fourier modes 

All of the above-mentioned pixels were fairly well- 
localized in real space. To make pixels reasonably localized 
in Fourier space, one can choose modes "0i that are plane 
waves, tapered by some weight function 4> to make them 
square integrable: 



^(r) = <p{vy v 



(22) 



for some grid points in Fourier space. Choosing all 
modes that are periodic in a box containing the survey 
volume will ensure that this is a complete set, although 
some high-frequency cutoff is of course necessary to keep 
the number of pixels finite. Four different choices of the 
volume weighting function cf> have appeared in the litera- 
ture: 



1 inside survey volume 
outside survey volume 
r) oc n(r) 

n(r) 



4>(r) tx 



1 + n(r)P 
r) oc eigenfunction of 



V 2 - 
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n(r) 



(23) 
(24) 
(25) 

(26) 



All are to be normalized so that the corresponding win- 
dow functions integrate to unity - J <fi(r)d r = 1 as shown 
below. Note that without careful choice of the 's, equa- 
tion (|l6|) will not be satisfied in general for these pixeliza- 
tions. The first choice, which weights all volume elements 
in the survey equally, was employed by e.g. Vogeley et al. 



(1992), Fisher et al. (1993), and Park et al. (1994). The 
second choice is used when determining the angular power 
spectrum of a sample without redshifts, e.g., the APM sur- 
vey (Baugh & Efstathiou 1994). Then all galaxies by de- 
fault receive equal weight (moreover, modes can of course 
only be computed in the directions perpendicular to the 
line of sight). The third choice, advocated by Feldman, 
Kaiser & Peacock (1994, hereafter FKP), minimizes the 
variance in the limit when fc~ x <C t he dep th of the survey 
and is discussed in detail in Section 4.3.1. Here P denotes 



an a priori guess of the power in the band under consider- 
ation. The fourth choice (Tegmark 1995, hereafter T95), 
gives the narrowest window function for a given variance, 
where the constant 7 determines the tradeoff. 

3.2.3. Spherical harmonic modes 
The spherical wave choice 



ipi(r) = Y £m (r)j e (k n r) 



(27) 



where Yg rn is a spherical harmonic and je a spherical Bessel 
function, is well-suited for full-sky redshift surveys, and 
has the advantage (Fisher, Scharf, & Lahav 1994; Heav- 
ens & Taylor 1995) of greatly simplifying inclusion of the 
effect of redshift-space distortions in the analysis. 

3.2.4. Guessed eigenmodes 



In Section 3.5.1, we describe a set of smooth functions 



known as continuum signal-to-noise eigenmodes. These 
modes have a number of useful properties. In particu- 
lar, there is an integer N such that the first N eigen- 
modes retain virtually all the cosmological information 
in the survey. If one has a reasonable guess as to the 
shape of these functions a priori, before computing them 
exactly (they depend on the survey geometry), the first N 
of these guessed modes are obviously a good choice for the 
pixelization functions ipi, since the amount of information 
destroyed by the pixelization process will then be small. 

3.3. Traditional methods 
The traditional approach has been to estimate the power 



by simply squaring the pixels (qi 



choosing the 



pixels to be Fourier modes as described above. This corre- 
sponds to the pair weighting of equation (|l3|), where ibj is 
give n by equation ( ^2|) and 4> is specified by equation (P3|), 
(§J), Hi or (H). Equation (|) shows that 



(o) 5 



1 



n(r) 



(2tt) 3 



|0(k-k,)| 2 P(k) 



d 3 k 
(2?rp 

the last term of which is simply the true power spec- 
trum convolved with the function |</>(k)| 2 . The 3D win- 
dow function is thus W*i(k) = \<p(k — ki)| 2 - Using this 
and Parseval's theorem (or equation ( |l0| ) directly), one 
finds that the window function normalization constraint 
J Wi(k)d 3 k / (2ir) 3 = 1 corresponds to simply 



(r) 



2 d 3 r 



1. 



(29) 



These simple expressions have frequently been used in the 
literature, but an annoying complication has often been 
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neglected. As described in Section [|, the fact that the 
normalization of n is not known a priori, but is deter- 
mined from the observed galaxies (the so-called integral 
constraint problem) can be eliminated by choosing weight 
functions ipi that are orthogonal to the monopole, i.e., 
such that ipi(0) — 0. Although this is easy to arrange with 
the pixelized methods described below, the choice of equa- 
tion (^2|) does generally not have this important property. 
To obtain a correct answer with the traditional methods, 
this must be corrected for. As shown in Appendix B, the 
power estimator 



Pi 



A, 



(30) 



is unbiased and incorporates the integral constraint cor- 
rection (when n is normalized so that Xi = for = 0) if 
the normalization factor Ai and the shot noise correction 
bi are given by 



A, 




where the functions a and b arc defined by 



o(k) 
6(k) 



(r)V kr d 3 r 



T) e lkr d 3 r. 



n(r) 



(33) 
(34) 



If the survey is volume limited, then n is independent of 
r, 6(k) = a(k)/n, and bi = Ai/n. 

After computing power estimates q% at a large grid of 
points kj, one finally takes some weighted averages of the 
qi to obtain power estimates in some bands of (scalar) k 
(= |k|). The problem of finding the weights that min- 
imize the variance of the band power estimators is un- 
fortunately quite a difficult one, and in general involves 
solving a numerically unpleasant quadratic programming 
problem (T95). For this reason, the customary approach 
has been to simply give all qi equal weights in some spher- 
ical shells in k-space although, as described in VS96, this 
is in general far from optimal. 

3.4. Brute force likelihood method 

Let us parameterize the power spec trum P(k) by some 
parameter vector as in Section 2.2. In the approxima- 
tion that the probability distribution for the pixel vector 
x is a multivariate Gaussian, it is given by equation ^ 
with mean m = 0. The maximum likelihood estimator of 
©, denoted ® m h is simply that ©-vector that maximizes 
the likelihood /(x;0). This maximization problem can 
unfortunately not be solved analytically when the number 
of pixels exceeds one, so & m i is a complicated non-linear 
function of x that must be computed by solving the max- 
imization problem numerically. Since one generally wants 
error bars on the estimate as well, one typically evaluates 



the likelihood function at a dense grid of points in pa- 
rameter space and rescales it to integrate to unity. These 
final results are often illustrated in contour plots, as at the 
bottom of Figure 1. 

3.5. Linear (Karhunen-Loeve) method 

As will be discussed in Section ^, the traditional meth- 
ods generally destroy information, while the brute force 
method is lossless but computationally impractical. The 
Karhunen-Loeve (KL) method (Karhunen 1947) maintains 
the advantage of the brute-force method (indeed, it can 
produce an essentially identical answer faster), and has 
additional useful features as well, as we detail below. It 
was first introduced into large-scale structure analysis by 
Vogeley & Szalay (see VS96), and it has also been suc- 
cessfully applied to Cosmic Microwave Background data 
(e.g. Bunn 1995; Bond 1995; TTH; Bunn & White 1997; 
Jaffe, Knox, & Bond 1997). Like any method designed to 
minimize error bars, the KL-technique requires an a priori 
assumption for the power spectrum. This is referred to as 
the fidu cial P (k). As will be described in sections |3.5.4, 



4.2 and 6.3.5, this is not a problem in practice, since a 
bad fiducial model does not bias the result. Moreover, the 
measurement itself can be used as the fiducial model in an 
iterative procedure if desired. 

We start by defining signal-to-noise eigenmodes in 
before generalizing the technique in Sec- 



Section 3.5.1 



tion 3.5.2 



3.5.1. Signal-to-noise eigenmodes 

The signal-to-noise eigenmodc method consists of defin- 
ing a new data vector 



y = Bx, 



(35) 



where b, the rows of the matrix B, are the N eigenvectors 
of the generalized eigenvalue problem 



Sb = ANb, 



(36) 



sorted from highest to lowest eigenvalue A and normalized 
so that b^Nb = I. This implies that 



(viVj) = M 1 + 



(37) 



which means that the transformed data values y have the 
desirable property of being statistically orthogonal, i.e., 
uncorrelated. In the approximation that the distribution 
function of x is a multivariate Gaussian, this also implies 
that they are statistically independent — then y is merely 
a vector of independent Gaussian random variables. More- 
over, since the eigenmodes diagonalize both S and N si- 
multaneously, equation (|3^) shows that the eigenvalues Xi 
can be interpreted as a signal-to-noise ratio S/N. Since 
equation (^) shows that the quantity yf — 1 on average 
equals this signal-to-noise ratio, it is a useful band power 
estimator when normalized so that its window function in- 
tegrates to unity. The window function is given by equa- 
tion (M) with tpi replaced by the continuous KL mode 

N 

^(r) = 2B«^(r), (38) 
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since = f[n(r)/n(r) — l]il)' i (Y)d 3 'r . For the volume- 
limited case, the noise power is simply 1/n, so the correctly 
normalized and bias-corrected KL band power estimators 
are simply "signal = noise x signal-to-noise", i.e., 



V, 



1 



(39) 



compare this with equation (p0[). Since the matrix B is 
invertible, the final data set y clearly retains all the in- 
formation that was present in x. In summary, the KL 
transformation partitions the information content of the 
original data set x into N chunks that are 

1. mutually exclusive (independent), 

2. collectively exhaustive (jointly retaining all the in- 
formation), and 

3. sorted from best to worst in terms of their informa- 
tion content. 

Typically, most of the KL coefficients j/j have a signal-to- 
noise ratio A <C 1, so that the bulk of the cosmological in- 
formation is retained in the first N' coefficients, N' <C N, 
which is why the KL method is often referred to and used 
as data compression. One can thus throw away all but the 
first N' numbers yt without any appreciable information 
loss, and this compressed data set will still satisfy proper- 
ties 1 and 3 exactly, and 2 to a good approximation. 

Bunn (1995) and VS96 have pointed out that the S/N- 
coefficients y are useful for power spectrum estimation 
since as long as the galaxy survey probes only scales 
smaller than the peak i n the power spectrum (cf., the dis- 
cussion in Section 3.3.3| ), the first N' window functions Wi 
have the following two properties: 



h^Ch = 1 instead of b f Nb = 1, since this is more con- 
venient throughout the rest of the paper. The matrix ele- 
ments are thus a factor (1 + A^) 1 / 2 smaller than in the 
previous subsection. As shown in TTH, solving the eigen- 
value equation ( |40| ) is useful for any parameter Oi on which 
C depends, even those which do not affect only the power 
spectrum (e.g., redshift-space distortions, cf., Appendix 
C), and the signal-to-noise eigenmode method discussed 
above is just the special case where Oi is the power nor- 
malization. The three properties listed above continue to 
hold in the general case, and the "information content" 
in item 3 above now refers to the information about the 
parameter Oi. 

The KL method is a very general data analysis tool. 
Note that the eigenmodes continue to be mutually exclu- 
sive and collectively exhaustive if we replace C,j by any 
symmetric matrix M in equation (|40|). To ensure that 
the KL modes give narrow window functions ranging from 
small to large k, one can therefore choose M to be the 
signal covariance matrix S that would arise from some 
monotonically decreasing fiducial power spectrum, for in- 
stance P(k) cx fc~ 3 . In that case, the modes become sorted 
by the ratio of "signal" in the fiducial power spectrum to 
noise; as the former is monotonic, the modes a re sort ed by 
scale. We discuss this point further in Section |6.3.3 . 



3.5.3. KL modes are asymptotically 
pixelization-independent 

The pixelizations listed in Section 3.2 were all in some 
sense arbitrary, and generally somewhat redundant. The 
KL modes can eliminate this arbitrariness, as mentioned 
in Section 



3.2.4 



If the pixelization functions ipi formed a 
complete set, spanning the space of all square-integrable 
functions over the survey volume, the continuum KL 
modes defined by equation d3cf ) would be some smooth 
functions %l>\ (r) that were independent of the pixelization 
used to compute them and depended only on P(k), n(r) 
and the geometry of the survey. In practice, the functions 
ipi do not of course form a complete set, since one is lim- 
ited to a finite number of pixels, but the continuum KL 
modes ^(r) can nonetheless be computed numerically. By 
choosing a pixelization that can resolve all features down 
to some scale R, equation ([38|) will accurately approximate 
all continuum modes whose window functions Wi(k) probe 
only scales k^ 1 3> R. If the number of pixels are increased 
further to probe smaller scales, the first N' modes remain 
stable against this perturbation: the first N' eigenvalues 
and eigenvectors would only change by a small amount 
(Szalay & Vogeley 1997). The new modes, on the other 
hand, would represent the small scale noise probed by the 
new pixel scale. By using the functions ip'i{ r ) to pixelize 
the data, i — 1, N, one can thus make sure that all of 
the cosmological signal down to the scale probed by tp' N is 
retained. 

3.5.4. Using KL modes for trouble spotting 

If the assumed power spectrum model is correct, the KL 
coefficients yi defined by equation ( |35| ) will be independent 
Gaussian random variables with zero mean and unit vari- 
ance. This offers an efficient way of testing whether the 
data are inconsistent with this model. The detection of, 

5 To prevent power spectrum plots from becoming too cluttered with points with large error bars, it is convenient to combine neighboring 
band power estimates with a weighted average — these broader band powers will of course still be uncorrelated since all the qi are. 



1. They are narrow in fc-space, 

2. As i increases from 1 to N', they probe all the scales 
accurately measured by the survey, from largest to 
smallest. 

Since they are also uncorrelated, these power spectrum 
estimators therefore have all desirable properties that one 
may wish for: they distill the cosmological information 
content of the data set into a set of mutually exclusive 
and collectively exhaustive chunks, which correspond to 
the band powers in a set of narrow bands. In the approxi- 
mation that y has a Gaussian distribution, the probability 
distributions of the power estimates qi are simple: they are 
independent % 2 distributions with one degree of freedom. 5 

3.5.2. General KL modes 

Let us write C = 0jS + N, where the power spectrum 
normalization parameter 9i — 1 in the fiducial model. 
Since S = C,i= dC/d6i, equation ( p6| ) can be rewritten 
as (TTH) 

C, 2 b = A'Cb, (40) 

where A' = A/(l + A) or, equivalently, A = A'/(l — A'). 
From now on, we will normalize the eigenvectors so that 
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say, a 6 — a outlier {\tji\ > 6) would provide strong evidence 
that there is either more variance on the scale probed by 
the i th mode than the fiducial power spectrum assumed, or 
that the probability distribution is strongly non-Gaussian 
on that scale. Even if it goes undetected, an incorrect 
assumed model does not bias the esti mate o f the power 



fej+i, where 



spectrum, as discussed below in Section 6.3.5 



3.5.5. Using KL modes for linear filtering 

The KL eigenmodes have an additional use. The process 
of throwing away the eigenmodes with low signal-to-noise 
ratios splits the space of all possible density fields given the 
data into two — one subspace that mostly contains noise, 
and one that is dominated by our generalized signal. The 
two are statistically orthogonal to one another. The ex- 
pansion of a given dataset over the signal subspace will 
substantially reduce the noise, thus representing a useful 
linear filtering of the data (cf., Seljak 1997). Since it max- 
imizes the signal for a given number of included modes, 
the KL transform is sometimes referred to as "optimal 
subspace filtering" . More generally, let us define a filtered 
data set 

JV 

y,iCK hj ir,g, (41) 

3=1 

for some weights Wj. The generalized eigenvector orthog- 
onality relation gives BOB* = I (cf., VS96, TTH), which 
implies that C _1 = B^B and CB^B = I. Hence equa- 
tion d35| ) gives CB^y = x, which implies that we recover 
the original data set (x' = x) in equation ( [i"l| ) if we choose 
the weights Wj = 1. Another simple example is the optimal 
subspace filtering mentioned above, which corresponds to 
the choice Wj = 1 for j < N' , wj = otherwise. Finally, 
it is easy to show that the choice Wj = Xj gives Wiener 
filtering, which is defined by x' = SC _1 x. In other words, 
Wiener filtering becomes diagonal in the KL basis, since 
it diagonalizes S and C simultaneously. Indeed, Wiener 
filtering is but one of many linear filters that are straight- 
forward to implement in the KL basis. 

3.6. Quadratic method 

In both traditional methods and the pixelized KL tech- 
nique, the power spectrum estimates qi are some quadratic 
functions of the observed density field n(r), i.e., of the 
form given by equation (|J). Hamilton (1997a) adopted 
a more ambitious approach, and computed the unbiased 
quadratic power estimators that have minimal variance, 
using a series expansion. T97 subsequently showed that 
these estimators are unbeatable; their Fisher informa- 
tion matrix is identical to that of the raw data, so no 
non-quadratic unbiased estimators can give smaller vari- 
ance. Moreover, they can be computed without recourse 
to the numerically cumbersome series expansion of Hamil- 
ton (1997a) (cf. T97 and Knox, Bond, & Jaffe 1997 for 
applications to CMB observations). Here we show how 
this method can be applied to galaxy surveys, and its re- 
lation to the KL method. Just as for the KL technique, a 
fiducial power spectrum is assumed. 

We will parameterize the power spectrum as a piecewise 
constant function with N' "steps" of height pi, which we 
term the band powers. Thus P(k) = pi for ki < k < 



= ki < k-i < ... < k 



N' + l 



CO, 



(42) 



and group them into an iV'-dimensional vector p. For the 
method to be strictly lossless, these bands should be cho- 
sen to be quite narrow compared to the scales on which the 
power spectrum varies. One first computes a compressed 
data vector q whose N' elements are quadratic functions 
of the data set x. These are defined by 



Qi = ^ Z ^.i 2 ' 

where the vector z is given by 

z = C -1 x, 
and the matrix is defined by 



(c,i) a6 



^(k^k)*-— . 

fc»<iki<fe i+1 ( 27r r 



(43) 



(44) 



(45) 



That is, Cj is the derivative of the covariance matrix C 
(equation (JL8)) with respect to the normalization of the 
i th band, inthe limit of narrow bands. Rewriting this as 



where 



Qi = ^ xtE t x ; 



E, = C 1 C,i C , 



(46) 



(47) 



we see that the matrix Ej is simply a discrete version of 
the pair weighting function Ei(r,r') in equation (||). Note 
that it is not separable, i.e., it has rank greater than one. 

For the Gaussian case, the Fisher information matrix 
for p defined by equation (§) reduces to (VS96, TTH) 



hi = gtr [C-^CT 1 ^] , 



(48) 



and T97 shows that both the mean and the covariance of 
q are given in terms of F: 



(qq ( > 



(qXq) 4 



Fp, 
F. 



(49) 
(50) 



This means that F _1 q is an optimal estimator of p, since 
its covariance matrix is precisely the inverse of the Fisher 
matrix. Moreover, as shown in T97, compressing the data 
set x into the coefficients q for some sufficiently narrow 
bands is a strictly lossless procedure, retaining all the in- 
formation about those cosmological parameters that af- 
fect galaxy clustering through the power spectrum alone. 
Equation (|49| ) shows that, in terms of the band powers, 
the window functions W of equation (||) for the coefficients 
qi are simply proportional to the rows of the ubiquitous 
Fisher matrix. 

The coefficients F _1 q tend to be both correlated and 
noisy, and therefore it is better to work with the trans- 
formed coefficients defined by 



p^F-^c 



(51) 
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renormalized so that their window functions integrate to 
unity. Here F 1 / 2 denotes the symmetric matrix whose 
square is F - it is readily computed by diagonalization. As 
shown by Tegmark & Hamilton (1997), these coefficients 
are all uncorrelated (multiply equation (^(j|) on both the 
right and the left by F -1 / 2 to see this), and moreover 
tend to be very well-behaved numerically, with narrow 
non-negative window functions (the rows of F 1 / 2 ) span- 
ning the entire range of scales probed by the survey. 

4. RELATIONS BETWEEN THE METHODS 
4.1. Relation between KL and quadratic method 

As we will now show, the linear and quadratic pixelized 
methods are closely related — the latter is simply a faster 
way of computing the same band power estimates. 

Let us optimize our KL modes to estimate not the over- 
all power normalizati on, bu t the band power in band i. As 



we showed in Section 3.5.5 



B f B. 



Introducing the diagonal matrix A 
tion ( fto| ) implies that 

BC iB f = A. 



(52) 

diag{A^}, equa- 
(53) 

We are partially suppressing the index i here for simplic- 
ity, although the eigenvectors in B and the eigenvalues 
in A of course depend on which power band i we opti- 
mize for. Equations (35), ( |52"|) and ( |5^ ) allow us to rewrite 
equation (J43|) as 



2qi = x + C _1 C iC -1 x = xtB+BC iB+Bx 



= y f Ay = 



3 



(54) 



i.e., as a weighted average of the squared KL coefficients 
yj , with weights given by the KL eigenvalues X'j . Thus the 
coefficients are weighted by their inverse variance, which 
means that this is the minimum- variance band power esti- 
mator based on the squared KL coefficients, so after sub- 
tracting off shot noise and normalizing correctly, the linear 
(KL) and quadratic methods give exactly the same esti- 
mator for the band power. As described in Section 6.4, 
the quadratic estimator is is simply faster to compute. A 
more intuitive way of understanding why the two methods 
give the same result is to note that the quadratic method 
was derived explicitly to be the minimum variance estima- 
tor (T97; cf., equation @), and that VS96 showed that 
the KL approach is guaranteed to minimize the variance 
as well. 

4.2. Relation between quadratic and ML methods: 
iteration 

If the quadratic method is repeated using the output 
(measured) power spectrum as the input (fiducial) power 
spectrum, then this iteration will eventually converge to 
the power spectrum that would be obtained with the brute 
force maximum-likelihood method (Bond, Jaffe, & Knox 



1997). This is because the quadratic method can be de- 
rived by expanding the logarithmic likelihood In / to sec- 
ond order around the fiducial point and maximizing it. 
The iteration thus seeks the maximum of In / by repeat- 
edly approximating it with a parabola. This is essentially 
the maximum-gradient method for maximization, which 
is known to have excellent convergence properties. In the 
one-dimensional case, this is equivalent to finding the zero 
of (In/)' by repeated linear approximations. This is sim- 
ply the Newton-Raphson root-finding method, known to 
converge exponentially fast and asymptotically double the 
number of correct decimals in each iteration. 

Although the ML-result can be computed by iterating 
the quadratic method, it should be stressed that this is 
not necessarily a good thing to do. If the measured power 
spectrum is noisy, so that the band power uncertainties do 
not satisfy AP <C P, then iteration can produce mislead- 
ingly small error bars, as the following example illustrates. 
Suppose sample variance or a shot noise fluctuation makes 
a band power measurement ten times smaller than the true 
value and this is used as the new fiducial band power P 
(prior). Then equation ( |62| ) shows that the nominal AP 
from sample variance will be ten times too small as well. 
If iteration is nonetheless desired, it is crucial not to let 
the prior over-fit the data. A better approach than simple 
iteration is therefore to use as prior a fit to the measured 
P(k) with a smooth parameterized fitting function, and 
keep adding fitting parameters until the the value of x 2 
per degree of freedom drops below unity (Tegmark 1997a; 
Seljak 1997). 

4.3. Relation between quadratic and FKP methods: the 
small scale limit 

For a traditional method with a volume weighting func- 
tion <j)(r), we define a quantity L implicitly by 



kh<\ 



, d 3 k 
(2^)3 



1 

2' 



(55) 



i.e., L" 1 is the radius of a sphere in Fourier space contain- 
ing half of the k = window function. For simple volumes 
such as a pencil beams, slices or spheres, L is of the order 
of the width of the survey in its narrowest direction. We 
will now show that in the small-scale limit where fc" 1 -C L, 
the quadratic method reduces to the FKP method, which 
implies that the latter is lossless and hence unbeatable for 
measuring the power spectrum on the smallest scales, as 
was pointed out by Hamilton (1997a). 

4.3.1. Derivation of the FKP method 
The FKP method (Feldman, Kaiser & Peacock 1994) is 



the traditional method described in Section 3.3, using the 
volume weighting of equation (p5|). The rank one power es- 
timates | a; 2 1 are averaged with equal weights 6 for all modes 
in a spherical shell |k| = k a , so integrating equation (13) 
over this shell using equation ( p2| ) for the pixels, we see 
that this corresponds to the pair weighting 



^(r,r') = ^(r)^(r')io(fci 



(56) 



6 An intuitive way to understand why all directions in fc-space should receive equal weight when Afc 3> L~ is to note that in this limit, the 
number of coherence volumes that fit into a given solid angle in the shell is independent of its shape (and hence independent of direction in 
fc-space), being merely the ratio of the shell subvolume to the coherence volume. 
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where jo(x) = sin(a;)/a;. This holds for any choice of vol- 
ume weightin g f unction (p. The specific FKP choice given 
by equation ( |25| ) is derived by minimizing the variance of 
the corresponding power estimators ft. This involves a 
number of approximations. We summarize the derivation 
here, to clarify why the FKP weighting is optimal only on 
small scales. 

Substituting equations ( fl5| ) and ( p2[ ) into equations (19) 
and (EG), we obtain the pixel covariance matrix 



+ 



(k-k a )0(k-k o )*P(fc) 



d 3 k 



(r) 2 



n(r) 



(57) 



The function 0(k) roughly falls off on a scale L" 1 . As 
long as L^ 1 <C k and the power spectrum P(k) is a smooth 
function, P(k) will therefore be almost constant where the 
first integrand is non-negligible (i.e., where |k — k | < L^ 1 
and |k — k(,| < ^ _1 ) and can be approximately factored 
out of the fc-integral. Using the convolution theorem, this 
yields 



= i(k a -k(,)- 



(r) 2 



P 



1 



n(r) 



(58) 



where P = P([fc a +fc b ]/2). This shows that C ab w if |k Q - 
k(,| 3> , so power estimates separated by much more 
than Ak — L _1 are essentially uncorrelated. Conversely, 
power estimates from nearby shells with |k a — k&| <C L^ 1 
are almost perfectly correlated and therefore redundant. 
The FKP method therefore averages the power estimates 
|a; Q | 2 in thicker shells |k a | G as in equation (|42"|), 

whose widths satisfy L^ 1 <C \k i+ i — k%\ <C k, giving ap- 
proximately uncorrelated power estimates ft. This approx- 
imation clearly only holds for small scales, k^ 1 <C L. We 
denote the volume of the i th shell 



4 

V s = -7T 

3 



(59) 



Assuming that the pixelized data x is Gaussian dis- 
tributed, the variance of the power estimator qi is simply 
given by averaging 2|C ao | 2 over the shell, i.e., 
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(60) 



where both integrals are to be taken over the i shell. 
Substituting equation (pq), and using the fact that L _1 -C 
|k a — k(,| <C fc, one of these integrals simply produces a 
factor T4. Applying Parseval's theorem to the result and 
using equation (p9f) to normalize 7 finally leaves us with the 
approximation 
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n(r)P 



(61) 



This holds for any weighting function <j>. The FKP choice 
of <fi is derived by minimizing this approximate expression 
for the variance. Since it is left unchanged if we multiply 
by a constant, we can for simplicity impose the normal- 
ization constraint f 4> 2 d 3 r = 1. Minimizing the numera- 
tor of equation (ph) with a Lagrange multiplier for this 
constraint now gives the FKP weigh ting of equation (p5|). 
Substituting this back into equation ( |6l| ) and omitting the 
power band index i for simplicity finally yields 



where 



V eff (k) = 



V 8 V e ff ' 



n(r)P(k) 
1 + n(r)P(fc) 



(62) 



(63) 



can be interpreted as the effective volume probed, since the 
integrand is of order unity where one is signal dominated 
(P ^> 1/n) and ~ where one is noise dominated. For 
a volume-limited survey with spatial volume V and con- 
stant n, the FKP prescription weights all galaxies equally, 
and we simply have V e ^ = [1 + l/nP] _2 y. An intuitive 
way to understand equation (p2) is to note that the nearby 
Fourier amplitudes Xi are correlated over a coherence vol- 
ume V c ~ 1/V. Thus as long as shot noise is not dominant, 
AP/ P ~ a/2/A/ , where Af is the number of approximately 
uncorrelated volumes V c that fit into the shell volume V s . 
In conclusion, we have shown that for traditional methods, 
the FKP volume weighting of equation ( p5| ) is optimal if 
and only if we limit ourselves to small scales, <C L. 

4.3.2. The small-scale limit of the quadratic method 

It is often convenient to work in the continuum limit 
where a discrete index i is replaced by a continuous vari- 
able such as r or k. Vectors ai and matrices Aij then 
correspond to functions of one and two variables, such as 
a(r) and A(r, r'). Since all sums get replaced by integrals 
in this limit, units frequently differ from the discrete case 

8 

Let us take the pixelization functions to be Dirac delta 
functions ipi(r) = 5 D (r—ri), corresponding to a continuum 
of pixels 

/ \ n(r) 
*(r) = ^ - 1, (64) 

and choose our parameters to be the values of the 3D power 
spectrum: 

0;=P(k;). (65) 
The matrices C and C,i then reduce to 



C(r,r') 



-ik- (r- 



)p(k) 



d 3 k S D {r-r') 



(27T) 



n(r) 



C )4 (r,r') = 



dC(r,r') _ 1 
dP(k 2 ) ~T2ny- 



zk.i ■ (r- 



(66) 



(67) 



V s [J^d 3 rY 

7 Since kL 2> 1, equation does not need the integral constraint correction 

8 Since b = Ax means bi = y"\ ^-ij x j m the discrete case and b(r) = J A(r, r')x(r')d 3 r' in the continuous case, we see that matrix multi- 
plication introduces units of the volume element, in this example d?r. Thus the continuous analog of the identity matrix, I(r, r') = <5 D (r — r'), 
is its own inverse, since (I 2 )(r, r') = J I(r, r")I(r" , r')d 3 r" = I(r, r'), even though it is not dimensionless — the delta function 5 D has units of 
inverse volume. 
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since 



P(k) 
dP( k)/dP (k. 



/P(ki)5 D (k - 
(2 7 r) 3 5 I? (k - k; 



ki)d 3 ki implies that 
). As we saw in Sec- 



tion |4.3. lj , the small-scale limit corresponds to neglect- 
ing the fc-dependence of P, which gives C(r, r') ~ [P + 
l/n(r)]8 D (r-r') and 



C-Him-') 



P 



n(r) 



^(r-r') = ^(r)<P(r-r'), 

(68) 

where (/> is the FKP weighting function of equation (25) 
normalized so that (j> = n / ( 1 + hP). 

We will now rederive the FKP results with much less 
effort than in the previous section, by simply using the 
quadratic method formulas. Substituting equations (67) 
and (68) into equation (^3|) shows that the quadratic esti- 
mators q are given by 



(69) 



where the function z(r) = <fi(r)x(r), which apart from the 
factor of 1/2 are exactly the FKP estimators of -P(k). Cal- 
culating the Fisher matrix by substituting equations (67) 
and (^8[) into equation ([l8]) gives 



1 



i J (l)(r)e- l ^- {r ~ r ' ] (t)(r')e~ lk3 ' ir '~ r) d 3 rd 3 r' 
il^-k,)! 2 . (70) 



Equation (H9) now gives 



d 3 kj 



(71) 



which shows that the Fisher matrix is simply the 3D win- 
dow function. To make the window function of the power 
estimate qi integrate to unity, we need to divide it by the 
quantity 



d 3 ki 
13 (2tt) 3 " 

where we have used equation 
equation (j25| ) and equation 
equation (pPf), this shows that 



~2P T ' 



(72) 



roj) , Parseval's theorem, 
in the last step. Using 
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where the last equal sign only holds for the volume lim- 
ited case where n is constant. Alternatively, averaging the 
power estimates qi over a shell in k-space as in the previous 
section, equation (|5^) reproduces the FKP error formula 
of equation ([32]). In summary, we have shown that the 
FKP method becomes lossless for measuring the power on 
small scales, since it equals the optimal quadratic method 
in this limit. 

5. SYSTEMATIC PROBLEMS — THE SELECTION 
FUNCTION AND EXTINCTION 

We have discuss ed the importance of the integral con- 
straint in Section 3.3. Here we discuss this issue in the 



context of pixelized methods, and generalize it to a whole 
host of systematic problems, including errors in our as- 
sumed selection function n(r) and our assumed extinction 
map. 

5.1. What is the integral constraint? 

If we knew the selection function n(r) a priori, before 
counting the galaxies in our survey, we would be able to 
measure the power on the scale of the survey. Our power 
spectrum estimate would essentially be the square of the 
ratio of the observed and expected number of galaxies in 
our sample. Of course, we do not know n a priori, so we 
use the galaxies themselves to normalize the selection func- 
tion. Thus the measured density fluctuation automatically 
vanishes on the scale of the survey, and naive application 
of any of the power spectrum estimation methods we have 
described will falsely indicate that P(k) — > as k — > 0, 
regardless of the behavior of the true power spectrum on 
large scales (Peacock & Nicholson 1991). See also Tadros 
& Efstathiou (1996). Equation (p) tells us that apart from 
the noise bias bi, there is an additional term Wi(0) that 
must be subtracted off to make qi an unbiased power spec- 
trum estimator. Let us assume that we know the shape of 
the selection function but not its normalization. To reflect 
this, we write the true selection function as 



n(r) = an (r), 



(74) 



where no is our guessed selection function, and a is an 
unknown normalization constant, and find that our noise 
bias corrected power estimator will have a residual bias 
(a — l) 2 Wi(0). Since we do not know the exact value of 
a, our only way to eliminate this bias is to require that 
Wi(0) = 0, i.e., by requiring the window function to van- 
ish at k = 0. This was first explicitly pointed out by 
Fisher et al. (1993). 

5.2. Its relation to extinction and other systematic 
problems 

Since requiring Wi(0) = eliminates the integral con- 
straint problem, the trouble is confined to the k = mode. 
If Wi(0) — 0, then the power estimate qi is clearly com- 
pletely independent of -P(O), the fluctuations in this un- 
trustworthy mode. The essence of our approach is there- 
fore the following: 

• We have a systematic problem with a certain mode, 
and can immunize our results from this problem 
by making them independent of this untrustworthy 
mode. 

When phrased in this way, it is clear that this approach 
can be applied to a variety of other systematic problems 
as well. For instance, incorrectly modeled extinction adds 
excess power in the form of purely angular modes (den- 
sity fluctuations that have no radial component, i.e., be- 
ing perpendicular to the line of sight). It might therefore 
be desirable to make the results independent of all purely 
angular modes on the relevant scales or, in a less ambi- 
tious approach, at least independent of those modes whose 
shape coincide with known dust templates. Similarly, mis- 
estimating the shape of the radial selection function (of 
no in equation (|^)) pollutes certain purely radial modes 
which one may wish to discard. 
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5.3. How to eliminate untrustworthy modes with 
pixelized methods 

In this section, we show how the power spectrum esti- 
mate from a pixelized method can be made insensitive to 
the type of systematic errors described above. 

Let us parameterize the true selection function n as 



A I 



n(r) =5^o 3 -rij(r), 



(75) 



where fij are known functions and the parameters dj, 
which we group into an Af-dimensional vector a, are a 
priori unknown. Imagine for example a simple case where 
M = 3, fix is our best guess for a purely radial selection 
function based on a Schechter luminosity function, is a 
(purely angular) dust template, and gives the effect of 
an infinitesimal error in the estimate of the characteristic 
luminosity L*: na(r) = dn\(r) / dL*. Let uq denote some 
a priori estimate of n. Defining the "uncorrected" pixels 
as 

~> - f n ( r ) 



n (r) 



we find that 



= Ua, 



(76) 



(77) 



where the N x M matrix U of untrustworthy modes is 
defined by 

u « s / W-) A{r)Sr - (78) 

This means that in general, the data set (x') ^ 0, so the 
uncorrected data set does not satisfy equation (|l7|). 

We show how to solve this problem in Appendix B. In 
short, one replaces x by a cleaned data set IIx, where II 
is a matrix that satisfies IIU = and thus projects out 
the untrustworthy modes. We find that the best choice is 
n = I - U(UtC- 1 U)- 1 UtC- 1 . Although the covariance 
matrix of the cleaned data set is not invertible, we find 
that with this choice of II, the quadratic method remains 
strictly optimal if we simply use C from the uncleaned data 
in equation ( H ) . We also show that the integral constraint 
correction given by equations ([HI) and ( |32| ) corresponds to 
this optimal method in the small scale limit. 

6. PROS AND CONS OF THE METHODS 

Above we have presented all methods for galaxy power 
spectrum estimation that have proposed in the literature, 
as well as the new pixelized quadratic technique and var- 
ious extensions, and showed how they are related to one 
another. We will now discuss their relative merits at some 
length. It will become clear that they are highly com- 
plementary, and we summarize the pros and cons of each 
method in Table 1. We will return to this in the discus- 
sion section, where we describe how the traditional, KL 
and quadratic methods can be used in concert to produce 
a data analysis pipeline having all the properties on our 
wish list from Section 2. 

TRAD LIN QUAD 



Optimal on largest scales 
Optimal on smallest scales 
Simple & uncorrelated errors 
Measures z-space distortions 
Accomodates systematic effects 



7+ +/- 



+ 



Table 1: Pros and cons of the traditional, linear (KL) 
and quadratic power spectrum estimation methods. 



6.1. Pros and cons of the direct Fourier method 

Unlike the pixelized methods, the traditional Fourier 
method uses the exact galaxy positions and thus retains 
all the small-scale information about P(k). 

As discussed in Section 1.3, this method becomes loss- 
less in the limit kL — > oo when the FKP choice of </>, equa- 
tion (|2^), is used. In addition, choosing the shell widths 
Ak 3> L _1 guarantees that the errors in the band power 
estimates Pi will be approximately uncorrelated. This 



means that on small scales, say kL <C 10%, this method 
satisfies the first three criteria on our wish list in Section ^| 
However, these advantages no longer hold when measur- 
ing power on scales comparable to the size of the survey. 
VS96 review problems with the direct Fourier approach 
that occur unless tL « 1. They fall into the two cate- 
gories described below. 

6.1.1. The direct Fourier method destroys information 

Hamilton (1997b) has shown that a strictly optimal di- 
rect summation method can be derived in principle, in 
terms of a series expansion, but this is unfortunately ex- 
tremely burdensome numerically except for scales much 
smaller than the survey size, away from the boundaries, 
where it approximates equations fl2q ) and (Eq). The opti- 
mal galax y p air weighting is not separable (m the sense of 
equation (|l3|)) and thus cannot be expressed in terms of a 
volume weighting function (f> as above. Consequently, no 
direct Fourier methods are lossless except on small scales. 

6.1.2. The method is complicated and computationally 

slow for large scales 

It is important to note that the Fourier transformation 
calculation takes only a negligible amount of time in a 
power spectrum analysis — the lion's share of the work 
involves computing the mean corrections Wj(0), the shot 
noise correction 6j, the normalization factor Ai and the co- 
variance matrix of the power estimates. This becomes nu- 
merically cumbersome on the largest scales, when kL ~ 1. 
This is because both smearing from the window function 
and the effect of the integral constraint become important 
and must be accurately computed in this regime. Even 
though the relevant integrals can be greatly accelerated 
with FFT's, the calculations are not only much more com- 
plicated and obscure than the simple linear algebra of the 
pixelized methods, but generally substantially slower as 
well. The most time-consuming step is the computation 
of the error bars AP and the covariance between power 
estimates using an integral constraint corrected and ap- 
propriately generalized version of equation ( |60| ) , since the 
integral in equation ( p7\ ) must be done separately for each 
pair of grid points (k7 , kj ) in Fourier space (cf. , Gold- 
berg & Strauss 1998). The property of uncorrelated errors 
is clearly lost, making it quite difficult to compute the 
optimal weights for averaging the power estimates in k- 
space into power bands in fc-space (T95, VS96). In short, 
when comparing the direct Fourier method to the KL and 
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quadratic methods on the largest scales, it is more com- 
plicated, uses more CPU time and produces an inferior 
result. 

6.2. Pros and cons of the brute force method 

The pixelized brute force method has been applied 
to galaxy survey analysis using expansions in spherical 
harmonics (Fisher et al. 1994; Heavens & Taylor 1995; 
Ballinger, Heavens & Taylor 1996). 

It is arguably the simplest of all methods, both concep- 
tually and in implementation. Moreover, it can be shown 
to be lossless in the limit of large data sets, thus giving 
minimal error bars. An important disadvantage is that 
it is slow. The slowest step in evaluating the likelihood 
function / is to compute detC, for which the CPU time 
required scales as N 3 , and / is evaluated at a large number 
of grid points in the multi-dimensional parameter space. 
This is why it is so useful if N, the number of pixels, can be 
reduced by a lossless data compression scheme before per- 
forming the likelihood analysis, throwing away noise and 
keeping the signal. As we described above, this is exactly 
what the Karhunen-Loeve and quadratic methods do. In 
the KL scheme, the compressed data set y was a linear 
function of x (of the form y = Bx for some matrix B), 
and in the quadratic scheme q was a quadratic function of 
x, of the form qi = x^E^x/2 for some matrices E,. 

A second disadvantage is that the maximu m li kelihood 
parameter estimate & m i (defined in Section 3.4) is such 
a complicated function of x that we cannot calculate its 
statistical properties analytically. As we saw above, both 
the linear and quadratic schemes allow us to write down 
power spectrum estimators in closed form (equations (39) 
and (43), respectively) This allows one to compute their 
probability distributions exactly, (the yi are Gaussian and 
the qi are are generalized \ 2 distributions, often close to 
Gaussian), which makes these power estimates easy to 
use for parameter fitting further down the data analysis 
pipeline. 

6.3. Pros and cons of the KL method 

6.3.1. It retains the phase information 

An important advantage of the KL method over the oth- 
ers in the table is that the KL coefficients retain the spatial 
information about the data (not only the Fourier ampli- 
tudes, but the corresponding phases as well). This means 
that information on processes which affect the radial and 
angular clustering patterns differently can be optimally 
probed with the KL method. Such effects include 

1. Redshift space distortions (cf., Appendix C), 

2. Galactic extinction (which affects only angular 
modes), 

3. Evolution and mis-estimates of n(r) (which affect 
only radial modes). 

We have not discussed in any detail how one might in- 
clude these effects in a KL analysis. Suffice it to say that 
for any physical effect that affects the covariance matrix 
of the pixels, one can determine KL modes which are op- 
timi zed fo r parameters which describe this effect; cf., Sec- 
tion 3.5.2 . The way to do this for redshift-space distortions 



(see Appendix C) is d escrib ed in more detail in TTH. We 
argue below in Section 6.3.4 that the simple signal-to-noise 
eigenmodes are appropriate general purpose modes for a 
KL analysis of any parameter which affects the covari- 
ance matrix only via the power spectrum. On the other 
hand, the KL modes should be custom tailored (using 
equation ([40|)) for parameters not in this category, such 
as ones causing anisotropic clustering. 

Another approach is that discussed in Section || and Ap- 
pendix B: rather than measuring these systematic effects, 
one can project out those modes that are sensitive to them, 
making the resulting dataset immune from them. This can 
be done for any pixelized method, but is especially simple 
for the quadratic method. 

6.3.2. Pixelization is not lossless 

The "— " on row 2 of Table 1 refers to the fact that com- 
putational constraints place an upper limit on the num- 
ber of pixels used for the eigenvalue problem, since the 
storage required scales as N 2 and the CPU time as N 3 . 
N — 10 4 is readily handled on a high-end 1997 worksta- 
tion (TTH) , and new methods under development (Szalay 
& Vogeley 1997) may well be able to increase this by an 
order of magnitude or more, but since the dynamic range 
is ~ N 1 / 3 cx(CPU time) 1 / 9 , some information will always 
be lost on the very smallest scales. Fortunately, this is not 
a problem in practice, since the complementary traditional 
methods work best precisely on the smallest scales. 

6.3.3. The KL power peak problem 

For very deep surveys such as SDSS and 2dF, which 
probe scales substantially beyond the expected peak in the 
power spectrum at ~ 200/i~ 1 Mpc, the KL window func- 
tions will generally not be narrow but double-peaked, since 
fluctuations longward of the peak have the same signal-to- 
noise ratio as certain fluctuations shortward of the peak 
and will get mixed in the corresponding KL modes. One 
readily circumvents this degeneracy problem by perform- 
ing a likelihood analysis on the KL modes, with the band 
powers being the parameters to be estimated. However, 
the statistical errors on the resulting band power estimates 
will no longer be uncorrelated, and since this is a nonlinear 
operation, they will also not have the simple \ 2 distribu- 
tion in general. This is why Table 1 indicates "+/— " in 
row 4: the "+" applies when we use the direct approach 
on a volume smaller than ~ 200/i _1 Mpc in size, and the 
"— " applies when we use the indirect approach on a deep 
dat a set s uch as SDSS or 2dF. As was described in Sec- 
tion 3.5.2, one can circumvent this power peak problem by 
using choosing a monotonically decreasing fiducial power 
spectrum such as P(k) oc fc -3 . This can hardly be said 
to make the KL-modes less "optimal" , since they will still 
partition the information into mutually exclusive and col- 
lectively exhaustive chunks. They simply become sorted 
according to a different criterion: not by their informa- 
tion content regarding the power normalization, but by 
the physical scale they probe. 

6.3.4. The KL multi-parameter complication 

The derivation of the signal-to-noise eigenmode method 
in e.g. VS96 or TTH does not prove that the compressed 
data set retains the bulk of the information about all pa- 
rameters of cosmological interest, but merely that it is 
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lossless with respect to the overall power spectrum normal- 
ization. In fact, this is not a problem in practice. TTH 
describe a method where one carries out a series of KL 
transforms, optimizing for each parameter of interest in 
turn, then pools all the resulting eigenmodes and uses sin- 
gular value decomposition to eliminate redundancy from 
the pool of modes. In addition, there is good reason to be- 
lieve that the information about all cosmological parame- 
ters is nonetheless preserved even with the signal-to-noise 
eigenmodes alone. We now give a hand-waving argument 
to this effect, and describe some detailed numerical ex- 
periments that support it. TTH computed three separate 
sets of KL modes for the CMB data set of the COBE 
satellite (N = 4016), optimized for measuring three dif- 
ferent parameters: the power spectrum normalization Q, 
the slope n and the reionization parameter r. The 3x3 
Fisher matrix was then computed from the three com- 
pressed data sets separately for N' = 500 modes retained. 
Each set of modes retained virtually all the information 
about their corresponding parameter, but the error bars 
on Q with the n or r modes were substantially larger than 
their Cramer- Rao minimum. That is, the n-modes and the 
t- modes lost information about Q. On the other hand, the 
Q-modes were found to retain virtually all the information 
about n and r. Examination of their window functions re- 
vealed why. To obtain a large lever arm for determining 
the slope, the n-modes were probing mainly the largest 
and the smallest available scales, ignoring those in the 
middle near the "pivot point" . The r-modes were ignor- 
ing the very largest scales, since these are unaffected by 
reionization and therefore carry no information about r. 
The Q-modes, on the other hand, were faithfully probing 
the power on all available scales, and therefore automat- 
ically retained all of the information about n and r as 
well, as a side effect. Thus as long as the galaxy power 
spectrum has no sharp features (which the signal-to-noise 
eigenmodes might potentially ignore) we expect the stan- 
dard KL modes optimized for the normalization to be close 
to lossless with respect to all cosmological parameters af- 
fecting only the power spectrum. 

6.3.5. Does KL bias the results towards our theoretical 

prejudice ? 

It has been argued that the KL method biases the re- 
sults by guessing an a priori fiducial power spectrum when 
computing the pixel covariance matrix. This claim has 
been extensively tested numerically (Bunn 1995, TTH), 
and found to be completely unfounded. In general, the 
effect of guessing an incorrect prior model is to leave the 
estimates unbiased, but with slightly larger error bars than 
what is optimal. If desired, any dependence on the initial 
data can clearly be eliminate d by iterating the KL proce- 
dure as described in Section 4.2, while at the same time 



reducing the error bars close to the minimum allowed by 
the Fisher matrix. 

6.4. Pros and cons of the quadratic method 

One advantage of this method is that its simple and un- 
corrected quadratic estimators have narrow window func- 
tions at all k, even beyond the peak in the power spectrum. 
Thus it is a useful complement to the KL method on the 
very largest scales, as indicated by row 4 in Table 1. 



A second advantage is evident from equation ( f43| ) : Since 
the CPU time for multiplication of a vector by a matrix 
scales as N 2 , the time for computing z scales as TV 2 as well 
if equation ( Jt4| ) is solved by an iterative technique such as 
the conjugate gradient method (Press et al. 1992). This 
is much faster than the KL method, which scales as iV 3 . 
This speed increase may allow the quadratic method to be 
used over a somewhat larger dynamical range than the KL 
method, extending down to smaller scales. The quadratic 
method can also be used as a faster way to obtain the same 
results as the brute force M L m ethod, using the iteration 
scheme described in Section |4.2j . 

Furthermore, we saw in Section |^ and Appendix B that 
the quadratic method can be made immune to various 
sorts of systematic errors which might plague the data. 

An important disadvantage is that, unlike the KL 
method, it does not retain any phase information. This 
is a drawback when estimating the underlying real-space 
power spectrum, since although it can measure this di- 
rectly by computing the appropriate C once the distor- 
tion parameter (3 = f2 6 jb is known, the KL method or 
another linear approach must be used first, to measure /3. 
Once cannot simply immunize the data from mis-estimates 
of (3 using the formalism of Appendix B, since j3 affects vir- 
tually all the modes. It does, however, appear possible to 
generalize the quadratic method to overcome this limita- 
tion (Hamilton 1997, private communication). 

7. DISCUSSION & CONCLUSIONS 

In this section, we summarize our discussion of the pros 
and cons of the various methods, and conclude by describ- 
ing an approach combining the strengths of all of them, 
illustrated in Figure 1. 

We found that although the direct Fourier approach is 
both simple to implement and virtually lossless on scales 
much smaller than the smallest dimension of the sample 
in question, it has several drawbacks on larger scales: 

1. It loses information, giving unnecessarily noisy mea- 
surements. 

2. It is quite tedious to implement numerically if one 
uses the exact expressions we have derived for the 
integral constraint correction, especially for comput- 
ing the covariance. 

In contrast, the two pixelized methods are lossless on large 
scales, but lose small-scale information because numerical 
constraints on the number of pixels limit the dynamical 
range. Since they are simpler to implement as well, they 
allow a more ambitious approach incorporating complica- 
tions such as redshift-space distortions, residual extinction 
and radial selection function errors (Section ^, Appendices 
B and C). The quadratic method can compute exactly the 
same band powers as the KL method, and do so faster (the 
number of operations scaling as the square rather than the 
cube of the number of pixels) , allowing more pixels and a 
larger dynamical range. The KL method, on the other 
hand, is the only one which retains the phase information 
in which clustering anisotropies (differences between the 
angular and radial clustering patterns) is encoded. Since 
redshift distortions and various systematic problems man- 
ifest themselves in this way, the KL method is therefore a 
powerful complement to the quadratic method, since the 
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former can quantify and subtract these systematic effects 
and pass the appropriate redshift-distortion parameter (3 
along to the latter, which can then measure the power 
spectrum directly in real space as described in Appendix 
C. Alternatively, the quadratic method can be immunized 
from such systematic effects, as described in Appendix B. 
The KL method is also useful for cosmographic purposes, 
where spatial information is everything. Finally, the KL 
method also has the advantage of greatly simplifying trou- 
ble spotting such as search for outliers and non-Gaussian 
behavior. 

In conclusion, we have found that although none of the 
methods can be made both feasible and lossless on its own, 
we can obtain a feasible and virtually lossless data analysis 
pipeline satisfying our entire wish list by combining three 
of them, as outlined in Figure 1. 

1. The power spectrum on scales k^ 1 < L/10 is esti- 
mated directly from the raw redshift data with the 
traditional direct Fourier approach. 

2. The raw data is binned into spatial pixels substan- 
tially smaller than L/10, so that this pixelization 
process retains all the information except that which 
was already captured by the traditional method. 

3. The linear (KL) method is used to measure 
anisotropy parameters such as n 6 /b (from redshift- 
space distortions), a residual extinction template 
and corrections to the radial selection function, as 
well as large-scale band powers. 

4. Uncorrelated estimates of the power spectrum on 
scales > L/10 are computed with the quadratic 
method, extending down to even smaller scales if 



N ~ 10 4 — 10 5 pixels are feasible. This can be done 
both by incorporating the systematic effects found in 
Step 3 with the KL method, and by using quadratic 
estimators which are insensitive to these systematic 
effects. The comparison of the results for the band 
powers allows us to quantify how successful we are 
in eliminating these effects. 

5. The entire process may be iterated, using the 
(smoothed) measured power spectrum as the fidu- 
cial one. 

6. Remaining cosmological parameters are estimated 
with a likelihood or % 2 analysis from the power spec- 
trum. 

This approach should allow future redshift surveys to real- 
ize their full potential to constrain cosmological models. In 
the meantime, it appears worthwhile to reanalyze various 
existing surveys with the same pipeline, to eliminate any 
method-induced artifacts and allow more accurate cross- 
comparisons of results. 
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APPENDIX A: SHOT NOISE REMOVAL 

There are two basic approaches in the literature to re- 
moving shot noise. In this Appendix, we show that the 
two give essentially identical results. 

Using the Gaussian approximation 

Let us define write our shot noise corrected band power 
estimator as (qi — bi), where bi denotes our bias correc- 
tion. Equation (^ shows that this estimator will only be 
unbiased if (pi) = bi, where 6j is given by equation (^|). 
A convenient way of removing the noise bias in pixelized 
methods is to choose simply b = bi, since it can be done 
after pixelizing, without ever going back to the individ- 
ual galaxy positions; for a general quadratic combination 
of pixels qi as in equation (Eq) , using equation (Eq) shows 
that the shot noise bias can be written in terms ofpixclized 
quantities alone, as 

6, = tr[E 4 N], (Al) 
where N is given by equation (|l9|). 

The strict minimum variance method 

In the approximation that the shot noise fluctuations 
in the pixels has a Gaussian probability distribution, the 
above-mentioned method of choosing bi — bi is readily 
shown to give the unbiased power estimator with the small- 
est variance. This is an excellent approximation when 
the number of galaxies is large, as we now show. The 
strict minimum-variance method is (Peebles 1980) to sim- 
ply omit self-pairs in equation (0): 



This corresponds to the shot noise correction 
h = I £i(r,r) 



n(r) 3 ^ = ^ Ei(T a ,T a ) 



n{vf 



(A2) 



(A3) 



which is to be compared with equation (^). This is an 
unbiased method since (bi) — bi. How much smaller is 
the variance with this approach? We illustrate this with 
the toy problem of estimating the power at k = in a 
volume-limited survey with N galaxies, which is simply 
proportional to 



q= (N-Nf 



(A4) 



where TV is a Poisson-distributed random variable with 
mean N. Since ((N — N) 2 ) = N, the shot noise correction 
bi = bi corresponds to the choice b = N. The strict min- 
imum variance method of equation (A3) corresponds to 
b = N. Both methods are unbiased, giving (q) = 0. The 
higher order moments differ, however. The variances (q 2 ) 
are 27V 2 and 2N 2 +N, respectively, so whereas the strictly 
optimal method gives the same variance that Gaussian 
noise would, the variance of the other method is a factor 
(1 + 1/27V) larger. [] 

9 The differences between the two methods get larger for higher 
moments, but always remain of the order 1/N. Compared with 



Which method is preferable? 

This means that the optimal method only reduces the 
standard deviation by a negligible 0.01% for N = 10 4 
galaxies. Moreover, it is incorrect to claim that the strictly 
optimal method in some sense removes the "exact" shot 
noise: since the higher order moments depart from the 
Gaussian values even for this method, there is clearly Pois- 
sonian noise left in q. The difference in variance between 
the two methods remain equally negligible for more real- 
istic examples, generally being of order the inverse of the 
number of galaxies in the survey. The choice of which 
method to use should therefore be dictated by practi- 
cal convenience. Whereas the strict minimum variance 
method is of course trivial to implement in techniques in- 
volving an explicit sum over galaxy pairs (such as the FKP 
method), the other method is generally simpler to use for 
pixelized techniques, since it can be implemented using the 
pixelized data alone.[^] 

APPENDIX B: DERIVATION OF INTEGRAL 
CONSTRAINT AND RELATED EXPRESSIONS 

In this appendix, we derive some of the results described 
in the text: the integral constra int c orrection for the tradi- 
tional Fourier method (Section |3.3| ), and the optimal way 
to immunize the data from the untrustworthy modes of 
Section ||. 

The Integral Constraint in the Direct Fourier Method 

How should one deal with the integral cons traint when 
using a traditional method as in Section [3.3| ? From the 
discussion in Section it is clear that we should mod- 
ify the weighting functions of equation (|2|) so that they 
become orthogonal to the mean density, i.e., so that 
§ ipi(r)d?r = 0. There are infinitely many ways of doing 
this, and some are clearly better than others if we want 



the power estimators qi 



to retain as much cosmo- 



logical information as possible. We here derive one such 
correction method which is both simple and intuitive, fol- 
lowing Tegmark (1997c). We will adopt a more ambitious 
approach in the following subsection, deriving the optimal 
correction method for the pixelized case. At the end of 
this Appendix, we show that in the small-scale limit, the 
two methods are in fact identical. 

If we use our guess no (equation |74|) in place of the un- 
known true selection function fi in equation (|l^), we will 

have (xi) = (a — l)^>j(0) ^ 0. When using a traditional 



power estimator qi 



' , this causes a systematic pos- 



itive power bias (a — l) 2 |i/)i(0)| 2 that we cannot subtract 
off, as a is unknown. We must therefore modify ipi so that 
ipi (0) vanishes. Let a denote our estimate of a. We will 
choose a so that this bias vanishes, i.e., so that the integral 

the Gaussian approximation {q 3 ) = 8 TV 3 , the skewness of the two 
methods is up by factors of (1 + 2/N) and (1 + 11/4JV + iy87V 2 ), 
and compared with the Gaussian approximation (q 4 ) = 607V 4 , the 
kurtosis is up by factors of (1 + 12/5JV + 2/15/V 2 ) and (1 + 33/5/V + 
23/127V 2 + 1/60/V 3 ). 

10 There is one useful exception: one can use the strict minimum 
variance method based on pixelized data alone in the special case 
where the pixels are counts in (sharp-edged) cells, in which case the 
terms a; 2 get replaced by Xi(xi — 1). 
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constraint 



n(r) 
dn (r) 



(r)d 3 r = 



holds, or explicitly, 



0) 7 n (r)' 



(r)d 3 r. 



(Bl) 



(B2) 



This is an unbiased estimator of the density normalization, 
since (a) — a, the true value. Substituting S/r) = afio( r ) 
and equations ( p2| ) and (B2) into equation (|l5|, we obtain 



n ( r ). e *k, 



n (r) 
n(r) 



(r)d 3 r - 0(ki)o 



~c/)(r)d 3 



k 4 ) /" n(r) 



(0) 



n (r) 



(r)d 3 r 



n(r) 



a 7 n(r) J n(r) 

where the function ipi is defined by 



V>i(r)d 3 r, 



■ikv -r 



0(0) 



Hence its Fourier transform is 



&(k) = 0(k-k i ) 



(0) 



(r). 



■0(k). 



(B3) 



(B4) 



(B5) 



The relative error in a is of order the inverse square root of 
the number of galaxies in the survey, so we can to a good 
approximation treat a as a known constant fro m h ere on 
and take a/a = 1 on the last line of equation (B3). We 



see that the volume weighting ipi given by equation (B4) 
is better than the traditional choice of equation (|2^) since 

it is orthogonal to the mean, i.e., it satisfies ^i(O) = 0, 
which guarantees that (xj) = 0. 

In practice, we need never use equation ( |B4[ ) to com- 
pute Xi with equation (|l5|), since this is implicitly done 
if we first cor rect n by estimating its normalization with 
equation (B2) and then apply the simple weight function 
of equation (P2|). (This is mathematically equivalent to 
applying the weight function equation (B4) directly to the 
data, using an a rbit rary n-normalization.) However, we do 
need equation (B4) to derive the expressions for the shot 
noise correction and normalization give n in Equations ( J32 ) 
and (|3l|). Substituting equation (B4) into equation ^9) 
gives the shot noise correction 



n(r) 



d A r 



(0) 



^ 3 7 



n(r) 



(B6) 



and expanding the square completes our derivation of 
equation (^2|). The normalization coefficient A; of equa- 
tion d30| ) is determined by the requirement that the win- 
dow function integrate to unity, i.e., A; = J \^i(k)\ 2 d 3 k/ (27r) 3 . 



Using Parseval's theorem, we obtain 



A 



(0) 



(r) 2 d 3 r. 



(B7) 



and expanding the square as above completes our deriva- 
tion of equation (|3l|). 

How important is this correction? 

Let us evaluate the integral constraint correction factor 
Ai for a couple of simple examples. We first note that for 
the special case of equation (|23|), we have 0(r) 2 oc 4>(r). 
Hence a(k) cx 0(k), and equation (J3l|) reduces to 



Aj 



0(k,) 



0(0) 



o(0), 



(B8) 



which we recognize as the result of Park et al. (1994). For 
volume-limited surveys, the prescriptions given by equa- 
tions (23), ( p4| ) and ( pq ) all coincide, so this expression is 
exact for the volume- limited case with these galaxy weight- 
ing schemes. For flux-limited surveys, on the other hand, 
these schemes all give a decreasing weight function ip, since 
n decreases with distance. For the simple Gaussian case 
(p(r) = exp [-(r/R) 2 /2]/ir 1 / 4 R^ 2 , equation © gives 

A t = 1 + e-( Rk ^ - 2e-i( Rk ^ , (B9) 
whereas the approximation ( |B8| ) gives 

A, = 1 -e~ {Rk ' )2 . (BIO) 

A Taylor expansion shows that for kR <C 1, the latter 
overestimates Aj by a factor of two as illustrated in Figure 
2. 



o 




0.01 

Target wave number 



0.02 
k, fl/h 



0.03 



Mpc] 



FIG. 2 — The exact expression for the integral constraint cor- 
rection Ai is plotted together with the approximation of Park et 
al (1994) for a Gaussian weight function tp(r) oc exp[— (r / R) 2 /2], 
R = 100/i -1 Mpc. 
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Eliminating contaminated modes with pixelized methods: 
A Simple Solution 

In this and the next subsection, we continue the dis- 
cussion of Section |5|, and present a method to project out 
modes of the density field that we believe might be con- 
taminated by, e.g., errors in the selection function or in 
the assumed extinction map. Our starting point is equa- 
tion (f78j), the N x M matrix U of untrustworthy modes. 

To remedy the problem, we construct a new "cleaned" 
data set that is independent of a, the coefficients of these 
modes. Let us define 



x = rix', 

where II is an N x N matrix satisfying 



nu = o, 



(Bll) 



(B12) 



i.e., having the columns of U in its null space. This im- 
plies that II has at most rank N — M . We will choose it 
to have exactly this rank, since otherwise II will destroy 
more information than necessary (for inst ance, null ma- 
trix choice II = satisfies equation ( B12 ), but destroys 
all our information). It is easy to construct such matrices, 
a simple choice being 



n = i-u(u t u)- 1 u t . 



(B13) 



This is the Hermitean (IF = II) projection matrix (II 2 = 
II) projecting onto the subspace orthogonal to the columns 
of U. Our corrected data set x satisfies equation (p^), since 
(x) = IlUa = 0. Letting C denote the covariance matrix 
of the uncorrected data set, the corrected data will have 
the covariance matrix 



,t\ 



nc'tf. 



(B14) 



Once x and C have been computed, the rest of the pix- 
eliz ed analysis proceeds just as described in Sections 3.4. 
|3.5| or p^q. The only complication is that C is now sin- 
gular, having rank N — M instead of N . As shown in 
the Appendix of T97, the correct way to deal with this in 
the quadratic method is to replace all occurrences of C _1 
(which is of course undefined) by the "pseudo-inverse" of 
C, defined as 

n[c + 7 uu t ]~ 1 n (bis) 

for some constant 7^0. T97 shows that the result is 
independent of 7, and that a good choice for numerical 
stability is 7 ~ c/N, where c is the order of magnitude 
of a typical matrix element of C. The same trick can be 



used for the KL method, in the step where equation (40) 
is reduced to an ordinary eigenvalue problem by Cholesky 
decomposing C as described in TTH. 



The optimal solution 



M 



Equation (B13) does not give the only rank N 
projection matrix satisfying IIU = — there are in 
fact infinitely many such matrices of the form II = I — 
U(UtMU) _1 U^M, where M is an arbitrary non-singular 
matrix, and equation ( B13| ) simply corresponds to the case 
M oc I. Since they all have the same null space U, it is 
clear that they all destroy the same information (all the in- 
formation about the untrustworthy modes, no more and no 



less). The power spectrum Fisher matrix for the cleaned 
data set is therefore independent of M, so our choice is 
purely one of numerical convenience. For the quadratic 
method in particular, there turns out to be a much more 
appropriate choice than that of equation (B13), which alto- 
gether eliminates the above-mentioned problem of C being 
singular by allowing the quadratic pair weighting E to be 
computed analytically. We will derive this choice of n by 
generalizing the derivation of the quadratic method (T97) 
to include our constraint that the results be independent 
of the corrupted modes. 

The most general quadratic estimator can clearly be 
written as in equation (Eq) for some symmetric matrix 
Ej. As shown in T97, this implies that the variance of qi 
is given by 



V(qi) 



1 



tr [CEjCEi 



(B16) 



and that the signal, the expected contribution to qi from 
the power band of interest, is tr [C,jEj]/2, where C,, is 
defined by equation (^5|) . To maximize the signal-to- noise 
ratio, we want to minimize the variance given a fixed sig- 
nal, i.e., subject to the constraint that tr [C,j Ej] is held 
constant. If we write 



1 



(x + Ua) t E l (x + Ua), 



(B17) 



it is clear that we can phrase our constraint on Ej as 
EjU = 0. This is in fact N x M separate constraint 
equations, so using the fact that Ej is symmetric, our con- 
strained minimization problem involves minimizing 

L = tr |icE 4 CE, - AC, 4 E 4 + A[UA t + AU^eA , 

(B18) 

where A is some arbitrary N x M matrix of Lagrange 
multipliers. Requiring the derivatives with respect to the 
components of E^ to vanish, we obtain 



Ej oc CT 1 [C,i -UA 1 - AU f ] CT 1 , 



(B19) 



where A is determined by the constraint EjU = 0. Defin- 
ing U = C _1 U, the solution is 



A 



I — iu(u t u) _1 u t 



(B20) 



which can be verified by dire ct substitution. Substituting 
this back into equation (B19) finally yields 



E, cx n t c _1 c,iC- 1 ii, 



where 



n 



I - U(U t U)- 1 U t = I - U(U f C 
is a projection matrix satisfying IIU = 



(B21) 



1 u)- 1 u t c- 1 

(B22) 
ITU = and 



C _1 II = IPC -1 . Inserting equation (B21) into equa- 
tion (Eq), we see that our quadratic estimator retains the 
simpleform of equation (^3|) if we generalize equation ( 44 ) 
to 

z = C^IIx, (B23) 
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so after cleaning the data set (replacing x by IIx), the 
quadratic method proceeds exactly as before. This gen- 
eralization of the quadr atic method clearly reduces to the 
prescription in Section |3.6| (z = C _1 x) if there are no 
untrustworthy modes, in which case M — and II = I. 
Note that this technique is quite useful for estimating the 
power spectrum from cosmic microwave background ex- 
periments as well, in which case obvious candidates for 
corrupted modes are the monopole and the three dipole 
components. 

Relation between the pixelized and continuous cleaning 
schemes 

In this section, we will show that the integral correction 
procedure for traditional methods that we derived above 
is lossless for the FKP volume weighting in the small-scale 
limit, corresponding to the quadratic method. 

When we have merely one untrustworthy mode {M = 
1) corresponding to the normalization of n, the matrix 
defined by equation ( |T8| ) consists of a single column vector; 
U = u. Using the continuum pixels x(r) of equation (p4j), 
this vector is simply the k = (constant) mode, i.e., 



u(r) = 1. 



(B24) 



Let us now evaluate the optimal power estima te <fc that we 
derived above, given by Equations fl43| ) and ( B23 ). Since 
U = u, we have 



n x 



utC- x u 



(B25) 



where Equations (|6|) and flB24| ) show that u^C 1 x = 
J 4>(r)x(r)d 3 r and u^C x u = J 0(r)d 3 r = 0(0). We can 
thus write 



z(r) = (C^IIxXr) = J 



5 D (r-r') 



(r) 



Fourier transforming this with respect to r', Equation (69) 
now shows that qi = |xj| 2 /2, where 



= z(ki) = J ^(r)x(r)d 3 r, 



(B27) 



and the volume weighting fun ction ipi is exactly the one 
that we derived in Section B.l, given by equation (B4). In 
other words, we have shown that the FKP choice of the 
funct ion together with the volume weighting of equa- 
tion ( p34j ) is identical to the lossless quadratic method in 
the small-scale limit. 

APPENDIX C: REDSHIFT DISTORTIONS AND 
CLUSTERING EVOLUTION 

In Section and Appendix B, we showed that the pix- 
elized methods allow a more ambitious approach than is 
feasible with the direct Fourier methods, incorporating 
multiple integral constraints, since all the complications 
simply became buried in the appropriate matrices. In this 
Appendix, we show how two additional complications can 
be incorporated with pixelized methods in the same vein: 
rcdshift distortions and clustering evolution. 



Redshift distortions 

A ubiquitous problem with power spectrum estimation 
is that of "redshift distortions" . When estimating the dis- 
tance to a galaxy by its redshift, galaxies receding faster 
than the Hubble flow due to local gravitational interac- 
tions appear to be further away than they really are, and 
vice versa. This was first discussed by Kaiser (1987) in the 
context of P(k), and a recent review is given by Hamilton 
(1997c). Denoting the apparent density field in redshift 
space 5 s (r), Hamilton & Culhane (1996) use Kaiser's for- 
malism to show in linear perturbation theory that 



1 + 



a(r) d 
r dr 



6r, (CI) 



where = tt°' 6 /b, the constant b is the so-called linear 
bias factor, and 



a(r) 



<91nn(r) 
<91nr 



(C2) 



is two plus the logarithmic slope of the radial selection 
function. Fourier transforming this gives (Hamilton & 
Culhane 1996; Hamilton 1997c) 



S s {k) = 5 r (k)+0 / /(k,k')?,(k 



d 3 k' 



(C3) 



where the function / is defined by 



/(k,k') = 
Thus we obtain 



Q t(k'-k)-r 



(k'-rf 



k'r 



d 6 r. 



(C4) 



(6 s (kyS s (k')) = (2tt) 3 / 3 (k,k',k")P(k")d 3 fc", (C5) 



(r')x(r')dV. where 
(B26) 



g{k, k', k") = S D (k - k")5 D {k' - k") + 
+ 0[S D (k - k")/(k', k) + 8 D (k' - k")/(k, k')*] + 
+ /3 2 /(k,k")*/(k',k"). (C6) 

The above expressions are derived and discussed in detail 
by Zaroubi & Hoffman (1996), and also in Tegmark & 
Bromley (1995) and T95 for the volume limited case; see 
Szalay, Matsubara, & Landy (1997) for further discussion. 

The key point here is that although (<5 s (k)*5 s (k')) is no 
longer diagonal, and rather messy, it is still linear in the 
power spectrum. Thus the pixel covariance matrix C will 
still be some shot noise term plus a term linear in P(k). 
In other words, by letting C $ refer to the derivative of C 
with respect to the band powers in real space instead of 
redshift space, the quadratic method will measure the real 
space power spectrum directly (given a priori knowledge 
of (3) , and the corresponding window functions (the rows 
of F, say) will show the contributions to the measurements 
qi from the various real space power bands. 

Clustering evolution 

The density fluctuation field S r maintains its shape in 
linear perturbation theory, simply increasing in amplitude 



21 



22 



by a position- independent growth factor D. Since we are 
seeing distant galaxies at an earlier time, we see the ap- 
parent density fluctuations 

5 a (r) = D(r)S r (r), (C7) 

where D(r) = 1/(1 + z) for fi = 1. This effect is strai ght 



forward to include in a pixelized analysis. Equation (19) 
remains unchanged and equation ( |20| ) simply gets replaced 

by 

S tj = J ^(k)^-(k)*P(fc)^3, (C8) 

where we have defined the functions V^( r ) = D(r)ipi(r). 
This correction is quite small for shallow galaxy surveys, 
where n typically varies dramatically between z = and 
z = 0.2, a range over which D changes by at most about 
20%, less for small 51. If this effect is incorporated into the 
analysis, can be made a free parameter to be fit for in 
the pipeline. 

This clustering evolution should not be confused with 
galaxy evolution, which we do not discuss here, and which 
affects only n, not 6 r . 

APPENDIX D: NORMALIZATION CONVENTIONS 

Unfortunately, the power spectrum P{k) is defined in 
many different ways in the literature, differing by normal- 
ization factors such as (27r) 3 and a fiducial box volume V. 
In this paper, we normalize Fourier transforms as 

/(k) = J f(r)e- ikr d 3 r, (Dl) 

and normalize P(k) so that 

(? r (k)*? r (k')) = (2vr) 3 <5 D (k - k')P(fc). (D2) 

The units of P(k) are volume. With this normalization, 
the dimensionless power A 2 of Peacock & Dodds (1994) is 
given by 

4-7T 

A 2 (fc) = ^fc 3 P(fc), (D3) 

the r.m.s. fluctuations erg in a sphere of radius R = 8/i _1 
Mpc are 



al = 4tt 



sin x — x cos x 



x 3 /3 



where x = kR, and the Sachs- Wolfe quadrupole Q in the 
cosmic microwave background is given by 

Q 2 ^^-C 2 = l 4 r ^P{ k )k 2 dk, (D5) 

47T 7T Z Jq 

where x w 2kc/H a w k x 6000/i _1 Mpc and 

3 sin x — 3x cos x — x 2 sin x 

32{x) = j . (D6) 

x^ 

T95 uses a convention where the (27r) 3 factor in equa- 
tion (p2j) is replaced by (2tt) 6 . 
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Fig. Dl. — We propose analyzing large future galaxy redshift surveys such as the SDSS, using three techniques in conjunction: a traditional 
Fourier approach on small scales, a pixelized quadratic matrix method on large scales and a pixelized Karhunen-Loeve eigenmode analysis to 
probe anisotropic effects such as redshift-space distortions and residual extinction. The horizontal bars in the power spectrum box indicate 
that the quadratic method has a larger dynamic range than the KL method. The bottom of the figure indicates that numbers such as the 
redshift distortion parameter Q o e /b which reflect anisotropic clustering can only be optimally constrained using the KL modes, which retain 
not merely the overall power, but the phase information as well. 
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